We describe a deep-branching lineage of marine Actinobacteria with very low GC content (33%) and the 
smallest free-living cells described yet (cell volume ca. 0.013 µm³), even smaller than the cosmopolitan 
marine photoheterotroph 'Candidatus Pelagibacter ubique'. These microbes are closely related to 16S rRNA 
sequences retrieved by PCR from the Pacific and Atlantic oceans 20 years ago. Metagenomic fosmids 
allowed a virtual genome reconstruction that also indicated very small genomes below 1 Mb. A new kind of 
rhodopsin was detected, indicating a photoheterotrophic lifestyle. They are estimated to be ~4% of the total 
number of cells found at the site studied (the Mediterranean deep chlorophyll maximum) and similar 
numbers were estimated in all tropical and temperate photic zone metagenomes available. Their geographic 
distribution mirrors that of picocyanobacteria and there appears to be an association between these 
microbial groups. A new sub-class, 'Candidatus Actinomarinidae', is proposed to designate these microbes. 



Actinobacteria were considered to be typical soil dwellers. However, with the advent of the molecular 
approach, 16S rRNA genes indicative of actinobacterial descent were found in the ocean 1 . Later, more 
sequences retrieved from marine habitats could be more specifically connected to the cultivated 
actinobacterium Candidatus Microthrix parvicella and were designated as the OM1 clade 2-4 . Moreover, rRNA genes 
that were identified as Actinobacteria were also found in significant numbers in lakes and other freshwater 
habitats 5,6 . The diversity of freshwater Actinobacteria turned out to be very broad with several groups described 
based only on 16S rRNA analyses, distributed over two orders (Actinomycetales and Acidimicrobiales) 6-10 . Using 
fluorescence in situ hybridization (FISH) and examination of enrichment cultures it was concluded that these 
aquatic Actinobacteria were very small in size (biovolume <0.1 µm³) and very abundant in oligotrophic 
freshwaters 7,8 . Recently, by using metagenomic approaches, aquatic Actinobacteria were shown to be low GC (mol% 
GC of genomic DNA 40-50%) compared to their high GC soil relatives 11,12 . The only other previously known low 
GC Actinobacteria were pathogens of the genus Gardnerella 11 . The higher surface:volume ratio of freshwater 
Actinobacteria likely improves their survival chances at the very low-nutrient concentrations found in oligo- 
trophic freshwater bodies 13,14 . Two genomes of low-GC Actinobacteria are now available, one from a lake in 
Wisconsin, USA, determined using single-cell genomics 15 (GC content 42%) and another using a culture based 
approach 16 (GC 51.7%). Both of these organisms are photoheterotrophs, possessing rhodopsins (actinorhodop- 
sins) to harvest light energy. 

In addition, metagenomic studies, including the Global Ocean Sampling (GOS), KM3 station at the bath- 
ypelagic zone and the deep chlorophyll maximum (DCM) in the Mediterranean sea found sequences that could 
be classified as actinobacterial 17-19 . However, the absence of long scaffolds in which phylogenetically informative 
genes appear linked to significant fragments of their genomes has prevented a reliable assessment of their 
diversity, phylogenetic placement and genomic features. 

Using a combination of metagenomics, flow cytometry and FISH, we describe here a widely distributed novel 
clade of marine Actinobacteria that have the lowest GC content reported so far as well as the smallest cells found 
among free-living prokaryotes. We propose the creation of a new sub-class 'Candidatus Actinomarinidae' to 
denominate this group of microbes. 
Results 

Ribosomal rRNA phylogeny. The deep chlorophyll maximum (DCM) 
is a section of the photic zone water column, in stratified temperate or 
tropical oligotrophic ocean waters, where most of the photosynthetic 
activity takes place 17,20 . We have sequenced a large number of 
metagenomic fosmids from the Mediterranean DCM (MedDCM; see 
Methods). Fosmids provide discrete, natural contigs that can be 
efficiently assembled to obtain genomic fragments from all members 
in the community, even from those that are less prevalent and less 
accessible to direct sequencing. During a search for rRNA genes in 
the assembled contigs, we identified two nearly complete rRNA 
operons classified as actinobacterial by the 16S rRNA Ribosomal 
Database Project (RDP, http://rdp.cme.msu.edu) classifier 21 and the 
23S rRNA SILVA large subunit (LSU, http://www.arb-silva.de) 
database 22 (MedDCM-OCT-S38-C68 and MedDCM-OCT-S40-C95). 



Surprisingly, the GC content of both of these rRNA containing contigs 
(33% and 32%) was far lower even than that of the recently described low GC 
freshwater Actinobacteria (GC% 42) 11,12,15 . Both contigs were syntenic 
to each other and showed high sequence similarity. Additionally, we 
identified another contig (MedDCM-OCT-S43-C55) (GC% 29.6) that 
overlapped with both rRNA-containing contigs, extending the reconstructed 
genomic fragment (Fig. 1a). A careful inspection indicated that 
the majority of genes in these contigs were similar to genes in 
actinobacterial genomes, providing additional evidence of their 
affiliation to this group. 
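
The GC percentages quoted above are a simple base-composition statistic. As a minimal sketch (not the code used in this study), the function below computes GC content over unambiguous bases only; the function name and the toy sequence are hypothetical and serve only as illustration.

```python
def gc_content(seq: str) -> float:
    """Return GC content as a percentage of unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    if total == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return 100.0 * gc / total


# Toy sequence (hypothetical, not from the study): 3 G/C bases out of 10.
print(gc_content("ATTAGCATTC"))
```

Ambiguous characters such as N are ignored in both numerator and denominator, which is one common convention; other tools may count them differently.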

We examined whether similar sequences had been assembled 
before by searching the 16S rRNA gene in the entire collection of 
assembled scaffolds from the GOS dataset 19 . This way, 13 GOS scaf- 
folds were retrieved using a stringent cut-off of 98% nucleotide iden- 
tity over 97% of the 16S rRNA gene sequence (species threshold 
Figure 1 | (a) Comparison of marine low GC Actinobacterial contigs containing rRNA genes to scaffolds from the Global Ocean Sampling (GOS) dataset 
(using BLASTN). The oceanic habitat (C-Coastal, CRA-Coral Reef Atoll, O-Open Ocean, E-Estuary), sampling locations (NAEC: North American East 
Coast, GI-Galapagos Islands, ETP-Eastern Tropical Pacific, PA-Polynesia Archipelagos) and the GOS dataset identifier are shown next to each GOS 
scaffold. Numbers in brackets indicate additional identical sequences found at the same location. All ribosomal RNA genes are highlighted in color and 
sequence identity amongst the contigs is shown in shades of grey (see color scale). (b) 16S rRNA phylogeny. 16S rRNA gene sequences from the assembled 
contigs and GOS scaffolds in the context of the entire Actinobacteria phylum, with Firmicutes as the outgroup. Actinobacterial Sub-Classes are in bold 
uppercase and Orders in bold italics. Sub-orders are shown in different colors in the tree and labeled (key is shown on bottom right). Freshwater 
actinobacterial clades are additionally marked with an asterisk. 'Ca. Microthrix parvicella', to which the Actinobacteria OM1 clade is related, is marked 
with a blue star. The novel branch with sequences attributed to sub-class ' Candidatus Actinomarinidae' is shown in red. Bootstrap values (shown as 
percentages) for all major branches are shown in colored circles (see key bottom left). 
level), and an additional 25 at >95% identity at 95% coverage. Even 
the comparison between the 16S-23S rRNA intergenic spacer region 
(ITS) of our contigs and those of the GOS indicated a high degree of 
conservation of these rRNA operons (Fig. S1). Most GOS scaffolds 
were short and contained only the rRNA operon, but some also 
presented a few more genes, which were remarkably syntenic to 
our contigs, although at a lower sequence identity (Fig. 1a). It is also 
interesting to note that the GOS scaffolds were all from temperate or 
tropical regions but geographically very distant from each other (e.g. 
Gulf of Panama, Equatorial Pacific). Moreover, we also found 250 
sequences in 16S rRNA clone libraries 21 (%identity >98% and cov- 
erage >98% of complete gene). These results independently confirm 
the genuine nature of our assembled contigs and show that they 
originate from a widely distributed group of ultra-low GC 
Actinobacteria only known through their 16S rRNA sequences. 
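
The scaffold selection described above combines an identity threshold with a coverage threshold (98% identity over 97% of the 16S rRNA gene, and a looser 95%/95% tier). A minimal sketch of such a filter follows; the `Hit` record, the 1500 bp query length and the three example hits are hypothetical and are not data from this study.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    subject_id: str
    percent_identity: float   # percent identity of the alignment
    alignment_length: int     # aligned length on the query, in bp


def passes(hit: Hit, query_length: int,
           min_identity: float, min_coverage: float) -> bool:
    """True if the hit meets both the identity and the query-coverage cut-off."""
    coverage = 100.0 * hit.alignment_length / query_length
    return hit.percent_identity >= min_identity and coverage >= min_coverage


# Hypothetical example: a 1500 bp 16S rRNA query and three made-up hits.
QUERY_LENGTH = 1500
hits = [
    Hit("scaffold_A", 99.1, 1490),
    Hit("scaffold_B", 98.5, 900),
    Hit("scaffold_C", 96.0, 1480),
]

species_level = [h.subject_id for h in hits if passes(h, QUERY_LENGTH, 98.0, 97.0)]
relaxed = [h.subject_id for h in hits if passes(h, QUERY_LENGTH, 95.0, 95.0)]
print(species_level, relaxed)
```

Applying the coverage test alongside identity keeps short, highly similar fragments (such as `scaffold_B` here) from being counted as full-length matches.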

We generated maximum-likelihood trees for the alignments of 
16S, 23S rRNAs and wherever possible to improve phylogenetic 
resolution, a concatenated alignment of both 16S and the 23S, in 
context of all known Actinobacteria (Fig. 1b, Fig. S2 and Fig. S3). 
All three analyses produced consistent results and unambiguously 
placed the rRNA sequences from the ultra-low GC Actinobacteria as 
a deep branching lineage, divergent enough to be a new subclass 
within the phylum. In one of the earliest studies using PCR amp- 
lification of the 16S rRNA gene performed in the Pacific and the 
Atlantic Oceans 1 a few deeply branching sequences belonging to 
Gram-positive bacteria were discovered, some of which were nearly 
identical to each other, even though they came from sampling sites 
that were quite far apart. This lineage was again recovered from the 
Sargasso sea, and described as the marine Actinobacterial clade 23 . 
Subsequent studies also confirmed the presence of another actino- 
bacterial group (also referred to as Actinobacterial clade OM1) and 
estimated their abundance in the range of 1-5% of the total com- 
munity 2,3 . Our analysis of all these short 16S rRNA sequences in the 
previous surveys indicates that these previously obtained sequences 
belong to two different groups. The Actinobacterial OM1 group has 
been previously recognized to be related to 'Candidatus Microthrix 
parvicella' 2,3 , and all the sequences in this group belong to the order 
Acidimicrobiales. However, sequences from the first two surveys 1,23 
are related to the sequences retrieved by our metagenomic fosmids, 
and belong in an independent well defined clade. Therefore, with 
additional evidence of the complete 16S and the 23S genes at 
hand, we propose the creation of the new sub-class, 'Candidatus 
Actinomarinidae' (order 'Ca. Actinomarinales', sub-order 'Ca. 
Actinomarineae', family 'Ca. Actinomarinaceae') for the taxonomic 
placement of this group of microbes. 

FISH hybridization and flow cytometry. As another completely 
independent way to verify the presence and abundance of these 
new Actinobacteria in the MedDCM, we used the 16S rRNA gene 
sequence to design a lineage-specific probe (LGC722; Table S1) and 
visualize them directly by FISH 24 (see Methods). The cells labeled 
with this probe were extremely small, even compared to Prochlorococcus 
cells, which are less than ~1 µm in diameter (Fig. 2a-d). Image 
analysis indicates that the cells are probably spherical, and are among 
the smallest free-living marine microbes identified to date. Analysis 
of the size spectrum of bacterioplankton from MedDCM samples by 
combined flow cytometry-FISH techniques (Fig. 2e) gave biovolume 
estimations for the cells matching the lineage-specific probe ranging 
between 0.006 and 0.024 µm³ (±SD 0.006 µm³) and an average 
diameter of 0.292 µm (±SD 0.044 µm). Assuming a spherical 
shape, the average cell volume calculated was only ~0.013 µm³. 
This extremely low biovolume is by far the lowest described for 
any planktonic prokaryote thus far (Table S2) 25-36 . In comparison, 
'Candidatus Pelagibacter ubique', considered the smallest 
autonomously replicating free-living cell, has a volume ranging 
from 0.019 to 0.039 µm³ 37 . Microscopy abundance estimates from 
the fluorescently labeled cells indicated that they comprised nearly 
4% of total bacterioplankton (~5 × 10³ cells ml⁻¹) and represented 
~80% of the cells hybridizing with a general actinobacterial probe 
(HGC236; Table S1). Given their extremely small size, we propose 
the taxonomic name 'Candidatus Actinomarina minuta' for these 
microbes. 
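
The cell volume quoted above follows directly from the spherical assumption, V = πd³/6. The short sketch below reproduces that arithmetic for the reported mean diameter; the function name is illustrative only.

```python
import math


def sphere_volume_um3(diameter_um: float) -> float:
    """Volume of a sphere (in cubic micrometres) from its diameter (in micrometres)."""
    return math.pi * diameter_um ** 3 / 6.0


# Mean diameter reported in the text: 0.292 µm.
volume = sphere_volume_um3(0.292)
print(round(volume, 3))
```

The result rounds to the ~0.013 µm³ stated in the text, below the 0.019 to 0.039 µm³ range cited for 'Candidatus Pelagibacter ubique'.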

Genome reconstruction. For a better understanding of the lifestyle 
of the ultra-small Actinobacteria, we identified more assembled 
contigs from our MedDCM metagenomic fosmids that could 
belong to this group. In addition to the strict criteria employed for 
selection (see Methods), all contigs were manually examined. 
Moreover, a tight clustering of these contigs was revealed by 
principal component analysis (PCA) of tetranucleotide frequencies 
indicating that they likely belong to highly related microbes 
(probably at the level of the same genus; Fig. S4). This method of 
studying genomes retrieved from metagenomic datasets has been 
shown to work very well previously 38,39 . We were able to retrieve 
Figure 2 | (a-d) Microscopic fluorescence in situ hybridization (FISH) image of samples from the Mediterranean deep chlorophyll maximum 
(MedDCM). The micrographs show two pairs of identical microscopic fields, with samples stained with DAPI (left) and with the new lineage specific low 
GC Actinobacteria probe (LGC722) labeled with Cy3 (right). Yellow arrows (left) indicate autofluorescent Prochlorococcus, and white arrows (right) mark 
LGC722 signal also detected by DAPI. Bar: 10 µm (all four panels). (e) Abundance and bacterial size structure by flow cytometry. The size structure of the 
heterotrophic bacterioplankton population is shown. Size distribution of targeted Actinobacteria according to FISH measurements is shown in black. 
Note that the left tail of the size distribution is mostly due to instrumental noise and not due to bacterioplankton size. 



SCIENTIFIC REPORTS | 3 : 2471 | DOI: 10.1038/srep02471 






43 contigs (longest 45.6 kb, shortest 7.3 kb, median GC 33.4%), 
which can be treated as a virtual (if incomplete) genome (Fig. 3a). 
We identified several overlapping contigs, but it is important to 
emphasize that a wide variation in the degree of relatedness was 
found among the overlaps (Fig. 3b). While some contigs were 
nearly identical at nucleotide level, others showed the average nucle- 
otide identity expected for members of different species within a 
genus. Synteny was largely preserved in all cases of overlapping 
contigs, suggesting that multiple lineages of these microbes are 
present concurrently at the same location. The combined length of 
these 43 contigs is 1317 kb and once coalesced they span only 
~700 kb (~800 genes). We analyzed the contigs for the presence 
of 35 orthologous markers defined previously 40 to estimate the 
completeness of the recovered virtual genome. Identification of 30 
of these markers indicated 85% genome recovery. Another estimate 
using the core genes of all complete actinobacterial genomes 
suggested that 68% of the genome was recovered. Taken together, 
they result in an expected, but still remarkably small, genome size in 
the range of 823-1029 kb (Fig. 3a). Moreover, the median length of 
intergenic spacers was 3 bp, comparable only to 'Candidatus 
Pelagibacter ubique' 41 , confirming a highly streamlined genome 
(Fig. S5). 
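
The genome size range above can be recovered by dividing the non-redundant recovered length by each completeness estimate. The sketch below shows that arithmetic with the rounded values given in the text (~700 kb; 85% and 68%); the function name is illustrative, and the calculation assumes that marker genes are recovered in proportion to the rest of the genome.

```python
def estimated_genome_size_kb(recovered_kb: float, completeness: float) -> float:
    """Extrapolate total genome size from recovered length and fractional completeness."""
    if not 0.0 < completeness <= 1.0:
        raise ValueError("completeness must be a fraction in (0, 1]")
    return recovered_kb / completeness


# Values from the text: ~700 kb recovered; 85% and 68% completeness estimates.
lower = estimated_genome_size_kb(700, 0.85)
upper = estimated_genome_size_kb(700, 0.68)
print(int(lower), int(upper))
```

The two bounds correspond to the 823-1029 kb range reported above.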



Comparison of the reconstructed genome with the only sequenced 
freshwater low-GC actinobacterial (acI cluster) genome 15 did not 
show any conserved synteny. However, they shared 418 orthologous 
genes (albeit with low average similarities, ~57%), a remarkably high 
proportion considering how phylogenetically distant the two 
microbes are (Fig. 1b). There were also a number of surprising 
parallels between the two genomes. Both microbes are putative 
photoheterotrophs containing rhodopsins. Rhodopsins are known to be 
important for light-harvesting in the photic zone of all aquatic 
environments 42,43 . We identified two rhodopsin-containing contigs 
(Fig. 3c) and retrieved 27 additional sequences (%similarity >95%, 
gene coverage 90%) from the GOS dataset (14 in scaffolds and 13 in 
metagenomic reads) (Fig. S6). These rhodopsins are distantly related 
to all other rhodopsins known so far, forming a novel branch in the 
phylogenetic tree (Fig. 3d). We suggest the name MACrhodopsins 
(Marine actinobacterial clade rhodopsins) for this new clade. It is 
quite likely that these rhodopsins are used as a supplementary energy 
source to their main chemoheterotrophic metabolism as shown for 
other marine microbes 37,44 . The rhodopsin flanking genes in these 
metagenomic contigs were also conserved: a photolyase, common in 
organisms exposed to light, and a thiol-disulfide reductase, also 
linked to the rhodopsin gene in 'Candidatus Pelagibacter' 41 (Fig. 3c 
Figure 3 | (a) Linear representation of 'Candidatus Actinomarina' contigs showing their overlaps. Estimates of the genome size based on different 
indicators are shown to the right with some reference small genome sizes. Two groups of contigs are highlighted in grey and are shown in greater detail in 
the panels below. (b) Multiple, highly related lineages. A group of contigs with overlaps indicating nucleotide identity (BLASTN, top) and translated 
protein identity (TBLASTX, below). A color scale is shown below. (c) Synteny amongst two rhodopsin containing contigs. The rhodopsin gene is shown 
in red. Overlaps are colored according to the color scale as shown (comparison performed with TBLASTX). (d) Marine Actinobacterial Clade 
Rhodopsins (MACrhodopsins). A maximum likelihood tree of all known types of rhodopsins is shown. The number of sequences in each clade of 
rhodopsins is indicated in brackets. 29 sequences from several Global Ocean Sampling (GOS) datasets were also identified using the novel sequences from 
the Mediterranean deep chlorophyll maximum and are part of the MACrhodopsin clade. Bootstrap values (shown as percentages) are indicated by circles 
(see key on bottom right). 
and Fig. S6). Analysis of the critical amino acids determining wavelength 
selection for light absorption 43 indicated that they absorb light 
in the green region of the visible spectrum. Green tuned rhodopsins 
are correlated with highly-productive marine environments 45 , such 
as coastal waters and the DCM. Genes involved in beta-carotene 
biosynthesis, e.g. geranylgeranyl diphosphate (GGPP) synthase and 
geranylgeranyl diphosphate reductase, were also found. Another 
interesting parallel with the acI genome available was the presence 
of a cyanophycinase. Cyanophycin is an amino acid polymer used as 
carbon and nitrogen storage material by several Cyanobacteria e.g. 
Synechococcus 46 . 

Other general metabolic pathways associated with aerobic life 
were shared by the two microbes such as several components of 
the TCA cycle, glycolysis, pentose phosphate pathway, superoxide 
dismutase and cytochrome c. No flagellar genes were present in 
either genome. Some other actinobacterial specific genes, e.g. for 
mycothiol biosynthesis and coenzyme F420-dependent enzymes, 
were also present in both genomes. On the other hand, some specific 
marine adaptations were found in 'Ca. Actinomarina', including a 
phosphotransferase sugar transport system (PTS). PTS systems 
can transport several sugars, as well as N-acetyl glucosamine 47 , 
which is widely available in the sea. Also consistent with the marine 
habitat was the presence of several Na⁺ symporters (Na⁺/H⁺, 
Na⁺/bile acid, Na⁺/phosphate) and operons for the uptake of phosphate 
and phosphonate (the Mediterranean sea being a phosphate-limited 
habitat). 

Biogeography and ecology. We examined the worldwide distribution 
of 'Ca. Actinomarina' using the 16S rRNA as a probe in several 
metagenomic datasets and also in the entire Ribosomal Database 
Project (RDP) 21 (see Methods) using extremely stringent cut-offs 
(Fig. 4a, Fig. S7). It appears that the representatives of this group 
are widely distributed in the photic zone of the ocean, both in the 
tropical and temperate belt, not unlike the distribution of picocyanobacteria, 
particularly Synechococcus 48 . This distribution is also well 
supported by the high number of reads recruited at very high 
similarity at both central North Pacific and North Atlantic gyres 
(Hawaii Ocean Time Series-HOTS and Bermuda Atlantic Time 
Series-BATS metagenomes 49,50 ) (Fig. 4b). However, like the 
picocyanobacteria, they are prominently absent from polar regions and from 
meso- or bathypelagic depths (Fig. 4a and Fig. S7). Further evidence of 
their preferential abundance in the photic zone is seen in HOTS and 
BATS metagenomic depth-profiles reinforcing their absence in 
deeper waters (Fig. 4c). The abundance of 'Ca. Actinomarina' along 
the depth profile remarkably mirrors that of Synechococcus. Along 
these lines, Synechococcus is known to produce cyanophycin 46 while in 
Figure 4 | (a) Worldwide distribution of 16S ribosomal rRNA of 'Candidatus Actinomarina'. Several metagenomes and the Ribosomal Database Project 
(RDP) database were examined. Locations where the 16S rRNA gene of ' Candidatus Actinomarina' was detected in the RDP database (%identity >98% 
and coverage >98% of complete gene) are shown in circles shaded according to the number of sequences (see key on the right). The number of reads 
detected in several metagenomes (GOS Open Ocean, Coastal, Coral Reef, Estuary, Warm Seep) are shown in percentages of total rRNA reads (%identity 
>98% and coverage 98% of metagenomic read) (see key on the right). Also shown (in white squares) are locations where no reads were detected. The 
world map shown here is a modified version of a freely available map made with Natural Earth at www.naturalearthdata.com. (b) Fragment recruitment. 
Metagenomic reads recruited (TBLASTX) by the 'Candidatus Actinomarina' contigs in three metagenomes, the Mediterranean deep chlorophyll 
maximum (DCM), BATS and HOTS. (c) Depth profile. Percentage of metagenomic reads assigned to 'Candidatus Actinomarina' genome in a depth 
profile of the HOTS and the BATS stations in comparison to Synechococcus. 
Prochlorococcus this storage material seems to be absent as our search 
for cyanophycin-synthetase in all available Prochlorococcus genomes did 
not reveal any such gene. The presence of the cyanophycinase gene 
also supports the Synechococcus-Actinomarina connection. 

Discussion 

The existence of new groups of aquatic Actinobacteria has been 
known for some time, but the difficulty in isolating these microbes 
in pure culture has hampered the advancement of knowledge about 
them. Single cell genomics has been used to describe the genome of 
one acI representative 15 . Here we have used metagenomic fosmids to 
partially reconstruct the genomes of uncultured marine Actinobac- 
teria. The reconstruction of genomes from metagenomes is extre- 
mely unreliable mostly due to the high intraspecies diversity that is 
characteristic of most prokaryotes. Similar observations have been 
made for the recently described Group II Euryarchaeota virtual gen- 
ome assembled from metagenomic data 51 . However, the large contigs 
provided by fosmids allow the inference of many properties of the 
microbes represented by them. The access to complete rRNA oper- 
ons has allowed a refined phylogenetic placement of the microbes 
and the proposal of a new taxon at the subclass level. Besides, com- 
plete sequences allowed the development of FISH probes that pro- 
vided independent confirmation of the presence and abundance of 
these microbes at a typical off-shore marine habitat. The DCM is one 
of the most characteristic ecological features of the stratified marine 
water column representing the most productive segment of the pho- 
tic zone. 

The actinobacterial cells characterized here are among the smallest 
free living cells described to date and fit very well with the character- 
istics of the typical photoheterotrophic cells that inhabit the pelagic 
niche of the oligotrophic ocean. The highly streamlined genome and 
the presence of rhodopsins that allow the cells a photoheterotrophic 
metabolism are common characteristics of the typical inhabitants of 
this niche. 

Thus far, all the abundant aquatic Actinobacteria found appear to 
belong to two orders, the Acidimicrobiales, found mostly in fresh- 
water but also in marine habitats (this is the most probable affiliation 
of the OM1 clade) or the 'Ca. Actinomarinales'. Further work of 
genome reconstruction coupled to single cell genomics or (ideally) 
to the retrieval in pure culture of one or more representatives will 
allow a better understanding of this remarkable group of marine 
prokaryotes, which considering their widespread presence might 
have an important role in the global carbon cycle. 

Methods 

Sequencing, assembly and annotation. DNA from ~6000 fosmids (each fosmid 
~40 kb) was extracted and pooled in 24 batches, with ~250 fosmids in each batch. 
These were sequenced using Illumina PE 300 bp reads (HiSeq 2000, Macrogen, South 
Korea) in a single lane (total output 42 Gb), which was expected to provide 
approximately 175X coverage for each fosmid. Sequences were quality trimmed and vector 
sequences were clipped. Assembly was performed separately for each batch using 
Velvet 52 and gene predictions on the assembled fosmids were done using Prodigal in 
metagenomic mode 53 and tRNAs were predicted using tRNAscan-SE 54 . Ribosomal 
genes were identified using ssu-align 55 and meta_rna 56 . Functional annotation was 
performed by comparison of predicted protein sequences against the NCBI NR 
database (available from ftp://ftp.ncbi.nih.gov/blast/db/) and domain predictions for 
the fosmids described in this work were performed manually using NCBI-CD 
search 57 and the HHpred server 58 . Local BLAST searches against the latest NCBI-NR 
database were performed whenever necessary. Tetranucleotide frequencies were 
computed using the wordfreq program in the EMBOSS package 59 . Principal 
components analysis was performed using the FactoMineR package in R 60 . 
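
To illustrate the compositional signature used for the PCA, the sketch below computes normalized tetranucleotide frequencies for one sequence in plain Python. It is not a reimplementation of the EMBOSS wordfreq program or of the FactoMineR analysis: it counts overlapping 4-mers on the given strand only and skips windows containing ambiguous bases, both of which are assumptions made here for simplicity.

```python
from collections import Counter
from itertools import product
from typing import Dict


def tetranucleotide_frequencies(seq: str) -> Dict[str, float]:
    """Relative frequency of each of the 256 tetranucleotides (overlapping windows)."""
    seq = seq.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=4)]
    counts = Counter(seq[i:i + 4] for i in range(len(seq) - 3))
    # Only windows made of A, C, G, T contribute to the total.
    total = sum(counts[k] for k in kmers)
    if total == 0:
        raise ValueError("sequence has no unambiguous tetranucleotides")
    return {k: counts[k] / total for k in kmers}


# Toy sequence (hypothetical): windows are AAAA, AAAA, AAAC, AACG, ACGT.
freqs = tetranucleotide_frequencies("AAAAACGT")
print(freqs["AAAA"], len(freqs))
```

Each contig then becomes a 256-dimensional vector, and contigs from closely related genomes are expected to cluster together when such vectors are projected onto their principal components.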

Phylogenetic analysis. Reference 16S rRNA sequences for all major actinobacterial 
lineages defined using 178 type strains, all known lineages of uncultured freshwater 
Actinobacteria (72 sequences), the closest BLAST hits to the Mediterranean 
actinobacterial sequences to the RDP database (available from http:// 
rdp.cme.msu.edu/) (27 sequences) and the GOS dataset (available from http:// 
camera.calit2.net/) (255 sequences) were collected to examine the phylogenetic 
relatedness of the low GC actinobacterial sequences. All sequences were screened and 
trimmed using ssu-align 55 . Only sequences more than 800 bp in length were retained. 
Sequences were aligned using MUSCLE 61 and a maximum likelihood tree was 
constructed using FastTree2 62 with a GTR + CAT model and a gamma 
approximation. Bootstrapping (1000 bootstraps) was done using the seqboot 
program in the PHYLIP package 63 . Assembled site-specific GOS scaffolds were 
screened for the presence of 16S genes and a stringent cut-off of >98% identity and 
>800 bp length was used to select scaffolds that belonged to the same lineage as the 
Mediterranean actinobacterial 16S sequences assembled from the fosmids. In 
addition, alignments were constructed using 16S rRNA secondary structure aware 
ssu-align 55 and phylogenetic trees were reconstructed. Similar results were obtained 
as above. For the rhodopsin tree, sequences were selected based on existing literature, 
PFAM domain searches, and BLAST searches against NCBI-NR and the GOS dataset 
metagenomic reads. Sequences were aligned using MUSCLE 61 and a maximum 
likelihood tree was constructed with RAxML 64 , using a JTT model and a gamma 
approximation with 100 rapid bootstrap inferences. 

Proteome comparison to freshwater Actinobacteria. Owing to the occurrence of 
several overlaps in the 43 actinobacterial contigs, some genes were represented more 
than once. Prior to comparison with the acI genome, the 1452 proteins from the 43 
actinobacterial contigs were clustered using USEARCH 65 at 90% identity. The 
clustering resulted in a smaller dataset of 1177 proteins, representing a 
non-redundant proteome of the marine Actinobacteria. This set was compared to the 1244 
proteins of the acI genome using a reciprocal best blast hit analysis to identify 
orthologs. Of these 1177 marine actinobacterial genes, 418 genes were found to be 
orthologous to the freshwater actinobacterial genes. 
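
The reciprocal best hit criterion used above can be sketched as follows: two proteins are called orthologs when each is the other's top-scoring hit. The example assumes hits are supplied as (query, subject, bit score) tuples and that the best hit is the one with the highest bit score; the protein identifiers and scores are hypothetical.

```python
from typing import Dict, Iterable, Set, Tuple

HitRow = Tuple[str, str, float]  # (query id, subject id, bit score)


def best_hits(hits: Iterable[HitRow]) -> Dict[str, str]:
    """Map each query to its highest-scoring subject."""
    best: Dict[str, Tuple[str, float]] = {}
    for query, subject, bitscore in hits:
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b: Iterable[HitRow],
                         b_vs_a: Iterable[HitRow]) -> Set[Tuple[str, str]]:
    """Pairs (a, b) where b is a's best hit and a is b's best hit."""
    ab = best_hits(a_vs_b)
    ba = best_hits(b_vs_a)
    return {(a, b) for a, b in ab.items() if ba.get(b) == a}


# Hypothetical hit tables for marine (m*) and freshwater (f*) proteins.
marine_vs_fresh = [("m1", "f1", 200.0), ("m1", "f2", 90.0), ("m2", "f2", 150.0)]
fresh_vs_marine = [("f1", "m1", 195.0), ("f2", "m1", 95.0), ("f2", "m2", 80.0)]
print(sorted(reciprocal_best_hits(marine_vs_fresh, fresh_vs_marine)))
```

Here `m2` is not paired with `f2` because the best marine hit of `f2` is `m1`, which is the kind of one-directional match the reciprocal test is meant to exclude.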

Genome size estimation. Genome size was estimated by two methods. First, a set of 
previously described 35 orthologous gene markers 40 was used. We were able to 
identify 30 of these genes in the 43 contigs. This suggests that the genome was 85% 
complete. In the second method, 4203 TIGRFAMs (available from ftp://ftp.jcvi.org/ 
pub/data/TIGRFAMs/) were searched in all known complete actinobacterial 
genomes (n = 232). A set of 71 TIGRFAMs was identified in all known 
Actinobacteria, forming a core set of genes. This core set of genes was tested against 
the nearly complete genome of the freshwater actinobacterium SCGC AAA027-L06, 
which was estimated to be 97.5% complete by using 138 complete actinobacterial 
genomes. We found 69 core TIGRFAMs in this genome, providing an estimate of 
97.1%, consistent with the previous estimate. The 43 contigs of 'Ca. Actinomarina' 
contained 48 core TIGRFAMs, indicating that 67.6% of the genome was recovered. 
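
Both completeness estimates above are ratios of markers found to markers expected. The sketch below recomputes them from the counts given in the text; the function name is illustrative, and the values agree with the reported percentages to within rounding.

```python
def completeness_percent(markers_found: int, markers_expected: int) -> float:
    """Percentage of expected marker genes that were detected."""
    if markers_expected <= 0:
        raise ValueError("markers_expected must be positive")
    return 100.0 * markers_found / markers_expected


# Counts from the text: 30 of 35 orthologous markers; 69 and 48 of 71 core TIGRFAMs.
for found, expected in [(30, 35), (69, 71), (48, 71)]:
    print(found, expected, round(completeness_percent(found, expected), 1))
```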

Metagenomic recruitment. Recruitments were performed using TBLASTX 66 , and a 
hit was considered only when it was at least 50 amino acids (aa) long with an e-value 
≤ 1e-5. For estimating the abundance of 'Candidatus Actinomarina', 
Synechococcus, Prochlorococcus and 'Candidatus Pelagibacter' in the HOTS (25 m, 
75 m, 110 m, 500 m, 4000 m) and BATS (20 m, 50 m, 100 m) datasets of depth 
profiles, the entire metagenomic datasets (for each depth) were compared to a 
customised NR protein database to which the 'Ca. Actinomarina' proteins were added 
(BLASTX). Only the best hits with an e-value ≤ 1e-5 and at least 50 aa length were 
considered towards the calculations of abundance for each taxon. 
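
The counting rule described above (one best hit per read, kept only if it spans at least 50 aa with an e-value of at most 1e-5) can be sketched as below. The `Hit` record, the use of bit score to choose the best hit, and the example reads are assumptions made for illustration and are not taken from the study's pipeline.

```python
from collections import Counter
from typing import Dict, Iterable, NamedTuple


class Hit(NamedTuple):
    read_id: str
    taxon: str
    aligned_aa: int
    evalue: float
    bitscore: float


def taxon_counts(hits: Iterable[Hit], min_aa: int = 50,
                 max_evalue: float = 1e-5) -> Counter:
    """Count reads per taxon using each read's best hit that passes both cut-offs."""
    best: Dict[str, Hit] = {}
    for hit in hits:
        if hit.aligned_aa < min_aa or hit.evalue > max_evalue:
            continue  # fails the length or e-value threshold
        if hit.read_id not in best or hit.bitscore > best[hit.read_id].bitscore:
            best[hit.read_id] = hit
    return Counter(hit.taxon for hit in best.values())


# Hypothetical hits for four reads.
example = [
    Hit("r1", "Actinomarina", 80, 1e-20, 150.0),
    Hit("r1", "Synechococcus", 70, 1e-10, 90.0),
    Hit("r2", "Synechococcus", 45, 1e-12, 85.0),   # too short
    Hit("r3", "Pelagibacter", 60, 1e-3, 40.0),     # e-value too high
    Hit("r4", "Synechococcus", 55, 1e-5, 60.0),
]
counts = taxon_counts(example)
print(dict(counts))
```

Dividing each count by the number of reads in the dataset would then give the per-taxon percentage plotted along a depth profile.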

16S ribosomal rRNA search across metagenomic datasets. The complete 16S rRNA 
gene sequence of 'Ca. Actinomarina' was used as a probe to identify related sequences 
across several marine metagenomic datasets e.g. the GOS dataset 19 , the 
Mediterranean DCM dataset 17 , Arctic Metagenome (NCBI SRA accession 
ERR071289), Puerto Rico Trench Metagenome 67 , Antarctica transect metagenome 68 , 
HOTS datasets 50 , and BATS datasets 49 . In addition, the entire RDP 21 was also searched 
to identify previously sequenced relatives. 16S rRNA gene sequences of all sequenced 
Prochlorococcus, Synechococcus and 'Ca. Pelagibacter' genomes were used as controls. 

16S ribosomal rRNA comparison with known marine actinobacterial sequences. 
All short 16S rRNA gene sequences described previously in surveys of actinobacterial 
diversity 1-3,23 were obtained from GenBank, aligned to the reference 
actinobacterial 16S alignment using a phylogeny-aware read alignment 69 and 
placed on the reference actinobacterial tree using an evolutionary placement 
algorithm 70 . Moreover, sequence identities to the reference sequences indicated that 
the Actinobacterial OM1 clade always had >95% identity along their entire length to 
sequences belonging to the order Acidimicrobiales. 

FISH and bacterial size structure. For microscopic counts of autotrophic 
picoplankton and heterotrophic bacterioplankton, water samples were fixed with a 
paraformaldehyde: glutaraldehyde solution to a final concentration (w/v) in the 
sample of 1% : 0.05% (w/v) 71 . Once in the laboratory, subsamples of 5-10 ml were 
filtered through 0.2 µm pore size black filters (Nuclepore™) (Whatman) at low 
pressure (< 100 mbar). For the autotrophic picoplankton (0.2-2.0 µm), a quarter of a 
filter was directly inspected under an inverted Zeiss III RS epifluorescence microscope 
(1250X, resolution 0.02857 µm/pixel) (Zeiss), and cells classified as prokaryotes or 
photosynthetic eukaryotes depending on their autofluorescence characteristics, 
shape, cell size and the presence of chloroplasts. For heterotrophic bacterioplankton, 
quantification was made on another quarter of the filter that was stained with 
4',6-diamidino-2-phenylindole (DAPI) 72 (Sigma) and counted with the same microscope 
(1250X). Autofluorescence and DAPI-generated fluorescence were determined by 
using a standard filter set for green and blue light excitation 73 . 

For FISH detection of Actinobacteria, water samples were fixed with a para- 
formaldehyde 4% 1 : 1 to 2% final concentration and filtered within the next two 
hours. We used a general probe HGC236 7 (we discarded HGC664 and HGC840 for 
the high mismatch with our Actinobacteria) and a new probe specifically designed for 
the targeted low GC Actinobacteria (Supplementary Table S1). For the design of the 
specific probes the Primer3 tool was used 74 . Four different oligonucleotide probes 
were constructed and tested; only LGC722 was used after checking for its specificity 
with the RDP 21 . All probes used were labeled with the indocarbocyanine dye Cy3 
(Thermo Scientific, Waltham, MA, USA). FISH was performed on white polycarbonate 
filter (0.2 µm) sections with the different oligonucleotide probes, also stained 
with DAPI, and mounted for microscopic evaluation. The protocol was performed as 
described in Sekar et al. 75 . Hybridization conditions for the probe LGC722 were 
adjusted by formamide (VWR BDH Prolabo) series applied to different subsamples. 
A minimum of 500 DAPI and probe-stained cells were measured per sample in an 
inverted Zeiss III RS epifluorescence microscope with the adequate set of filters. 
Absolute densities of hybridized bacteria were calculated as the product of their 
relative abundances on filter sections (percentage of DAPI-stained objects) and the 
DAPI-stained direct cell counts. Images from FISH were analyzed using NIH ImageJ 
Software to determine cell dimensions for a minimum of 500 cells 
(http://rsb.info.nih.gov/ij/index.html). The biovolume of coccoid Actinobacteria was calculated as a 
sphere. 
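
The absolute density calculation described above is the product of a relative abundance (probe-positive objects as a fraction of DAPI-stained objects) and the DAPI direct count. A minimal sketch follows; the total count of 1.25 × 10⁵ cells ml⁻¹ and the 20-of-500 tally are hypothetical values chosen only so that a 4% relative abundance yields ~5 × 10³ cells ml⁻¹, the order of magnitude reported in the Results.

```python
def absolute_density(dapi_cells_per_ml: float, probe_positive: int,
                     dapi_objects: int) -> float:
    """Cells per ml of the probe-targeted group: relative abundance x DAPI direct count."""
    if dapi_objects <= 0:
        raise ValueError("dapi_objects must be positive")
    relative_abundance = probe_positive / dapi_objects
    return dapi_cells_per_ml * relative_abundance


# Hypothetical tally: 20 probe-positive cells among 500 DAPI-stained objects (4%),
# with an assumed DAPI direct count of 1.25e5 cells per ml.
print(absolute_density(1.25e5, 20, 500))
```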

For cytometric identification, quantification and size structure approximation 76 of 
the bacterioplankton and autotrophic picoplankton (APP) cells, a Coulter Cytomics 
FC500 flow cytometer (Brea, California, USA) equipped with an argon laser (488 
excitation), a red emitting diode (635 excitation), and five filters for fluorescent 
emission (FL1-FL5), was used. Bacterioplankton abundance and size structure was 
determined with argon laser by green fluorescence (Sybr Green I, Sigma-Aldrich, 
Missouri, USA) using a FL1 detector (525 nm). APP abundance was determined by 
combining the argon laser and red diode with red fluorescence (Chlorophyll a and 
phycobiliproteins autofluorescence) using a FL4 detector (675 nm). For size 
calibration, beads (polystyrene fluorospheres) of different sizes were measured 
(0.79 µm, 1 µm, 4.9 µm and 10 µm). In addition, Prochlorococcus cells were also 
used as controls. The lower and upper size limits of measurement are 0.25 µm and 
40 µm, respectively. The measured diameter of 'Ca. Actinomarina' cells is 0.29 µm, 
which is at the lower end of the scale. 